What Should Be Done About Bias and Misconduct in Clinical Trials?

Kathleen A. Fairman, MA, and Frederic R. Curtiss, PhD, RPh, CEBS
"Biased reporting of results from [new drug application] trials is particularly concerning because these journal articles are the only peer reviewed source of information on recently approved drugs for health care providers.…" 1
"[Because of selective reporting of research findings, doctors] end up asking, 'How come these drugs seem to work so well in all these studies, and I'm not getting that response?'" 2
"To me, the remarkable thing is how effective a very brief visit by a drug representative, most often less than five minutes, can be in influencing physicians' choices to use a drug for an unapproved indication." 3
In recent years, highly publicized litigation and new U.S. Food and Drug Administration (FDA) reporting requirements have given researchers an unprecedented opportunity to "peek inside the statistical black box" 4 and examine data on drug product safety and efficacy that were previously unreported in the peer-reviewed literature. The resulting proliferation of research on clinical trial conduct (and misconduct) has confirmed what many managed care pharmacy professionals have known for years: physicians often unknowingly rely on suboptimal information in making prescribing decisions. As the above quotations illustrate, among the most troubling problems is the reporting of incomplete or erroneous information in communications targeted to physicians, including reports of clinical trials published in the peer-reviewed literature. For example:
• A 2008 analysis of trials supporting new drug applications approved by the FDA in 2001-2002 found that 22% were unpublished at least 5 years after their completion, and the odds of publication were 4.8 times higher for trials with favorable results than for trials with unfavorable results. 5
• A 2008 analysis of FDA-registered clinical trials for antidepressants approved from 1987-2004 found that 97% of trials with favorable results were published, but only 8% of trials with unfavorable results were published in a form that presented those results as unfavorable. 6
• A 2007 analysis of promotional detailing visits for gabapentin from 1995-1999 found that 46% led to physician intention to prescribe or recommend the drug, and 38% of those visits involved discussions in which the "main message" was at least 1 off-label use. 7
Various components of the health care system, including government regulators, journal editors, and pharmaceutical manufacturers, are cobbling together a patchwork of solutions to the problem of inaccurate "evidence" about the efficacy and safety of prescription drugs. But will their efforts be sufficient to encourage the production and use of high-quality evidence now, when the process of health care reform most requires it?

Selective Reporting Can "Mislead Doctors and Patients" 8
In an analysis published in January 2008, Turner et al. identified phase 2 and 3 clinical trial programs for 12 antidepressant agents that had been approved by the FDA between 1987 and 2004. To determine whether the trials had been published, the authors searched standard sources, including PubMed, the Cochrane Central Register of Controlled Trials, and references in review articles. If no publication of a trial could be identified, they contacted the study sponsor by certified letter, requesting information about study publication. 6
Turner et al. found a strong association between favorability of results for the study drug and likelihood of publication. Overall, of 74 FDA-registered studies conducted in 12,564 patients, 51 (68.9%) were published, including 37 of 38 studies (97.4%) that the FDA had judged as indicating positive findings for the study drug.
In contrast, of 36 studies judged by the FDA to have produced negative (n = 24) or questionable (n = 12) results, 3 (8.3%) were published with the negative findings presented, 11 (30.6%) were published with a representation that the findings were positive, and 22 (61.1%) were not published. The researchers concluded that "the efficacy of [the antidepressant] drug class is less than would be gleaned from an examination of the published literature alone . . . . As a result of selective reporting, the published literature conveyed an effect size nearly one third larger than the effect size derived from the FDA data," leading physicians to make "inappropriate prescribing decisions." 6
Whittington et al., studying trials of selective serotonin reuptake inhibitors in children and adolescents with depression, had produced similar findings in 2004, concluding that when the unpublished data were considered, "risks could outweigh benefits of these drugs (except fluoxetine) to treat depression in children and young people . . . . Non-publication of trials, for whatever reason, or the omission of important data from published trials, can lead to erroneous recommendations for treatment." 9
In an analysis that examined the reporting of clinical trial data used in new drug applications submitted to the FDA, Lee et al. (September 2008) identified 909 trials supporting 90 new drugs approved between 1998 and 2000, using FDA review documents and the approved drug label. The authors categorized trials as "pivotal" if they were described in both the summary document for drug approval and the "clinical studies" section of the drug label. During follow-up of at least 5.5 years after the FDA approval date, 394 of 909 trials (43.3%) were matched to at least 1 publication. Although pivotal trials were much more likely than nonpivotal trials to be published (75.6% vs. 24.1%, adjusted odds ratio [OR] = 5.31, 95% confidence interval [CI] = 3.30-8.55), only 82 of 340 (24.1%) pivotal trials were published in a peer-reviewed journal prior to FDA approval. Only 36.1% of trials with nonsignificant results, versus 66.0% of trials with statistically significant results, were published (adjusted OR = 3.03, 95% CI = 1.78-5.17). 10
Rising et al. (November 2008) used a similar methodology to analyze all efficacy trials that were included in FDA-approved new drug applications for new molecular entities from 2001-2002. Of 164 trials, 128 (78.0%) were published. In multivariate analyses, the odds of publication were multiplied by 4.8 for trials with favorable primary outcomes for the study drug (compared with unfavorable outcomes, OR = 4.77, 95% CI = 1.33-17.06; P = 0.018) and by 3.4 for trials with active controls (compared with placebo only, OR = 3.37, 95% CI = 1.02-11.22, P = 0.047). 5
The work of Ramsey and Scoggins (January 2009) is especially important to the matter of selective publication because it covers a time period following the implementation of reforms intended to curb the practice of selective reporting. Beginning in approximately 2004 and 2005, the International Committee of Medical Journal Editors (ICMJE) and major journals instituted policies that required registration of clinical trials and submission of trial protocols with study manuscripts. 11-13 In 2007, Ramsey and Scoggins used the clinicaltrials.gov database to find all trials of antineoplastic medications, identified using 9 search terms specific to cancer (cancer, neoplasm, carcinoma, myeloma, leukemia, lymphoma, melanoma, sarcoma, and mesothelioma). Of 2,028 trials completed or terminated as of September 2007, 17.6% had been published as of December 2007, including 3.4% of terminated trials, 19.5% of completed trials, and 5.9% of industry-sponsored trials. When the sample was restricted to trials registered prior to September 2004, 21.0% of trials were published after more than 3 years of follow-up. 14
A fascinating look at the process that reportedly underlies selective publication was provided in a January 2009 report by Landefeld and Steinman, who were unpaid consultants for the plaintiff (biologist David Franklin) in a lawsuit over the manufacturer's promotional activities for the anticonvulsant gabapentin. 15 Based on a content analysis of corporate documents, which became publicly available as a result of the discovery process in the litigation, the authors reported extensive evidence of ghostwriting and a "publication strategy" for marketing purposes. As part of their review, Landefeld and Steinman identified written communications indicating that the publication of a multicenter, placebo-controlled trial, which had produced a finding that gabapentin had no effect on the study's primary outcome, would be delayed because "we [Parke-Davis employees] should take care not to publish anything that damages neurontin's marketing success." 15 This particular drug and manufacturer were certainly not unique. Similar issues in clinical trials of other medications have been reported, sometimes anecdotally, sometimes using systematic assessment of corporate documents; 16 the practices of ghostwriting and publication planning are reportedly endemic. 17,18

HARKing and Texas Sharp-Shooting: Unexplained Discrepancies Between Study Protocols and Published Articles
The practice of HARKing (hypothesizing after results are known), described by psychologist Norbert Kerr in 1998, 19 has been observed in managed care pharmacy studies in recent years; we have previously described the problem as being akin to "Texas sharp-shooting," in which a gunfighter draws a target around holes that he has already shot in the side of a barn. 20 Although the potential for HARKing in retrospective database analyses is clear, recent evidence suggests that the practice has affected clinical trials as well.
Chan et al. (December 2008) compared study protocols with publications for 70 trials of medical interventions, including 56 drug trials, approved by the Copenhagen and Frederiksberg, Denmark, scientific-ethics committees from January 1994 through December 1995. 21 Among trials in which key elements of study methods were reported in both the protocol and the publication, numerous important discrepancies were identified. For example, the protocol and publication frequently differed in their descriptions of the statistical tests used in the primary outcome analyses (25 of 42 trials, 59.5%), the sample size calculation method (18 of 34 trials, 52.9%), and the methods for handling protocol deviations (19 of 43 trials, 44.2%). In 1 trial, the study protocol indicated that a delta (difference between study groups) of 10% would be considered clinically significant; the publication employed a delta of 6% (10% vs. 16%). Of 25 trials in which subgroup analyses were described in both protocol and publication, all contained discrepant information. This problem is particularly troubling because post hoc subgroup analyses are widely recognized as a key source of misleading and erroneous information in research reporting. 16,20
A broader previous review of the same database and time period, also conducted by Chan et al. (2004), compared study protocols with publications for 102 trials that were published in 122 journal articles. 22 A questionnaire was used to query authors about unreported outcomes, defined as those specified in the protocol but not reported; authors were asked to provide the statistical significance of the findings and the reasons for omitting them from the study report. Major discrepancies between protocol and publication, defined as (a) failing to describe an outcome as primary when it was originally designated as primary, (b) omitting a primary outcome from the study report, (c) introducing a new primary outcome in the study report, or (d) using different outcomes in the protocol and publication to calculate statistical power, were identified in 51 (62.2%) of 82 trials that reported primary outcomes. Only 48% of authors responded to the survey; 86% of respondents initially denied having unreported outcomes. The most common reasons provided for not reporting prespecified outcomes in the final publication were lack of statistical or clinical significance and journal space restrictions.

No Medal for MEDAL: Choice of Inappropriate Comparator Drug
In the MEDAL (Multinational Etoricoxib and Diclofenac Arthritis Long-term) trial (2006), 24,913 patients with osteoarthritis and 9,787 patients with rheumatoid arthritis were randomly assigned to receive etoricoxib (either 60 mg or 90 mg per day) or diclofenac (150 mg per day) and were followed for a mean of 18 months. 23 The risk of thrombotic cardiovascular events was similar for the 2 drugs in the intention-to-treat analysis, with a hazard ratio (HR) of 1.05 (95% CI = 0.93-1.19) for etoricoxib compared with diclofenac. There was also no difference in complicated upper gastrointestinal events (0.30 events per 100 person-years for etoricoxib vs. 0.32 for diclofenac).
Psaty and Weiss point out in an editorial (2007) that these outcomes were predictable, since the cyclooxygenase 2 (COX-2) selectivity of diclofenac is similar to that of celecoxib. 24 These editorialists questioned the MEDAL report's stated reason for choosing diclofenac as the comparator, namely that it is "the most widely prescribed NSAID [nonsteroidal anti-inflammatory drug] in the world," observing that in the United States, diclofenac is not the most widely used NSAID. The FDA's Arthritis and Drug Safety and Risk Management Advisory Committees recommended in February 2005 that naproxen, not diclofenac, be the preferred comparator for trials of new nonselective NSAIDs and COX-2 inhibitors. 25 As evidence for their concerns about the MEDAL trial's comparator group, Psaty and Weiss cited a 2006 meta-analysis of 121 placebo-controlled trials of COX-2 inhibitors, which showed a relative risk (RR) of vascular events of 0.92 (95% CI = 0.81-1.05) for COX-2s when diclofenac was used as the comparator (26 randomized controlled trials [RCTs]), and 1.57 (95% CI = 1.21-2.03) when naproxen was used as the comparator (42 RCTs). 26 Not surprisingly, given its previous recommendation that naproxen was the comparator of choice, in April 2007 the FDA's Arthritis Drugs Advisory Committee voted 20 to 1 against approval of etoricoxib (Arcoxia), in part based on concerns about the use of diclofenac as the comparator drug. 27,28 Psaty and Weiss put into context the public health concerns that arise when inappropriate comparator drugs are used in clinical trials, pointing to market considerations 24 such as resurgent interest in the approval and marketing of COX-2 inhibitors by pharmaceutical manufacturers in the United States beginning in early 2007. 29 However, the road to market expansion for COX-2 inhibitors now appears to be uphill, partly because of concerns about whether clinical trials of these medications accurately capture their benefits and risks versus those of other treatments. 27,28,30,31
The problem of questionable comparator selection is by no means limited to studies of COX-2 inhibitors. For example, a 2006 review and commentary by Psaty et al. concluded that 3 of 4 large industry-sponsored antihypertensive medication trials conducted between 2002 and 2004 had used atenolol as the comparator drug (versus losartan, verapamil, and amlodipine), even though a low-dose thiazide diuretic would have been a more clinically appropriate comparator. 32 The problem of inappropriate comparators is also not limited to drug trials. In surgical trials, problems in allocation concealment and "performance bias" (the use of more experienced surgeons to perform experimental procedures, often out of appropriate concerns for patient safety) have been linked to the likelihood of finding positive results for newer procedures. 33 For example, a trial of laparoscopic versus open cholecystectomy for acute cholecystitis, reported in 1998, randomized patients either to laparoscopy conducted by the experienced study investigators or to the comparison procedure (open cholecystectomy), which was performed mostly by senior surgical residents. 34 Referring to studies employing inappropriate comparators as "commercial speech rather than compelling science," Psaty et al. observed in 2006 that study sponsors sometimes choose comparators "in a way that is likely, a priori, to cast their agents in the best light." 32 Moreover, Psaty et al. argued that problems in comparator selection have far-reaching implications for the base of available knowledge about the risks and benefits of treatments and, ultimately, for public health: "For clinical trials to have value and to merit wide dissemination, they must focus on the key questions in the field. Industry-funded trials that compare their products as first-line drug therapy with inferior agents do not provide useful evidence about either first-line or second-line drug treatment." 32

Registry Databases of Clinical Trials: Available But Assailable
In response to concerns about incomplete reports of research protocols and results, the 2007 FDA Amendments Act expanded the authority of the FDA to monitor drug and device safety and required that the "basic results" of clinical trials be available in a publicly accessible database. 10,35 "Basic results" should include "demographic and baseline characteristics" of the sample, as well as "a table of values" and "results of scientifically appropriate tests of the statistical significance" for "each of the primary and secondary outcome measures for each arm of the clinical trial" at the time of registration of the study with the FDA. 35 Lee et al., evaluating new drug applications included in the database from 1998-2000, found the information to be "variable in detail and content, and not an adequate substitute for full publication in the medical literature." 10 Similar concerns were expressed by the ICMJE in 2005: "Acceptable completion of data fields [in a clinical trial registry database] is an important concern. It shouldn't be, but it is. Many entries in the publicly accessible clinicaltrials.gov database do not provide meaningful information in some key data fields." 11 However, the data that were the subject of these observations were collected in the years before public disclosure was required; we are not aware of systematic assessments of the quality of more contemporary protocol data.
More recently, an Office of the Inspector General (OIG) review identified deficiencies in FDA oversight and reporting of clinical investigators' financial conflict of interest information. The deficiencies were attributed by the OIG and FDA to several factors, including underreporting of financial information by trial sponsors to the FDA, limitations in the FDA's authority to obtain the information, lack of a comprehensive clinical investigators database, and absence of an FDA protocol to review and abstract financial information. The OIG recommended that financial information be submitted to the FDA during the pre-trial application process rather than in the marketing application process that occurs after clinical trial completion. In opposition to that proposal, the FDA argued that a requirement to collect pre-trial financial information would "take significant additional effort for both industry and FDA" and is inappropriate because "financial interests are only one form of potential bias." 36

Clinical Trial Bias and Misconduct: Past, Present, or Future?
In the sometimes sensationalistic reporting that accompanies news of research misconduct, limitations of the investigations that led to the controversy may go unnoticed. First, although recently published, much of the work that identified selective reporting was based on trials conducted before many journals began requiring that clinical trials be registered to be eligible for publication. Even the work of Ramsey and Scoggins covers a period very shortly after policies requiring clinical trial registration were implemented, when the full effects of these policies might not yet have been realized. Thus, the degree to which current clinical trials are selectively reported is unknown. It is reasonable to expect that the quality of publicly available clinical trial databases will improve as accessibility to those databases is expanded and their contents critically assessed.
Second, as Turner et al. acknowledge, lack of publication could result "from a failure to submit manuscripts on the part of authors and sponsors, from decisions by journal editors and reviewers not to publish, or both." 6 In that regard, Ramsey and Scoggins point out that journal reviewers may be uninterested in nonsignificant trial results that do not appear to change practice or produce new information. 14
Still, recent evidence suggests several troubling deficiencies in the information that is available to physicians making prescribing decisions in routine clinical practice today. First, much published information is biased, suggesting that drugs are more efficacious than they actually are. Second, even when clinical trial results are published, they are often not available to practitioners until long after FDA approval of new drugs, irrespective of the timing of direct-to-consumer advertising that is prompting consumers to "ask their doctors about" them. 37 Third, as Turner et al. have pointed out, the problem of selective reporting becomes self-perpetuating. When the "known" effect sizes for a drug are inflated and researchers use those effect sizes in power calculations, subsequent studies are statistically underpowered. Thus, future studies will be less likely to detect real clinical effects when they do exist, making those studies less likely to be reported in the peer-reviewed literature. 6

Interpretations of Current Evidence About Clinical Trial Misconduct: No Dearth of Opinions
Given growing public concern about whether drug safety issues are adequately addressed during the drug approval process, 38,39 it is possible that recent high-impact FDA decisions, like those made about COX-2 inhibitors and, more recently, the application of black-box warnings to oral antidiabetic medications, 40,41 will prompt changes, whether initiated by or imposed upon the pharmaceutical industry, in the methods used to design and report clinical trials. Not surprisingly, there is considerable debate about the changes that are most appropriate or likely to be effective.
More drastic proposals include Landefeld and Steinman's call for "independent public funding of peer-reviewed pharmaceutical research through a National Institute for Pharmaceutical Research that might be funded by a tax on all drug sales" 15 and Chan's call for complete access to regulatory agency submissions and study protocols, including data that are currently redacted from databases as proprietary to study sponsors. 10,36,42 Others have argued that the FDA should dictate trial protocols 24 or that an independent commission or public-private partnership between the pharmaceutical industry, consumer groups, and federal government agencies (e.g., the Department of Health and Human Services, Department of Veterans Affairs, and the Department of Defense [DoD]) should prioritize topics to be studied and select appropriate comparators. 24,43 More modest (and in our view, more realistic) proposals include the application of stringent reporting standards to clinical trial registration databases, including the requirement that study protocols adhere to Consolidated Standards of Reporting Trials (CONSORT) standards, 10 and use of an expanded and more detailed list of data elements to be included in publicly available databases. 42 The Standard Protocol Items for Randomized Trials (SPIRIT) initiative is working on "evidence-based recommendations for key information to include in a trial protocol." 42
Processes that encourage transparent interplay between payers, manufacturers, providers, and patients may also provide part of the solution. The DoD formulary review process, described in an article elsewhere in this issue, combines detailed, evidence-based economic and clinical review with qualitative assessments of provider and beneficiary opinion. With the exception of pricing data, most relevant information is publicly available on the DoD and TRICARE websites. 44 Such accessibility represents the direction in which the field of drug efficacy and safety review should be moving. Additionally, more attention to instances of "bad" behavior will no doubt help encourage accuracy, completeness, and transparency.
Attention to research misconduct is likely to, and should, expand far beyond clinical trials alone. For example, Bell et al. (2006) found evidence of selective reporting in cost-effectiveness studies published between 1976 and 2001; few studies reported a cost per quality-adjusted life year (QALY) of more than $50,000, and the odds of reporting a cost per QALY of less than $20,000 were doubled for industry-sponsored work. 45 We have previously observed that post hoc "Texas sharp-shooting" analyses are particularly easy to perform and therefore especially problematic in retrospective analyses of administrative claims. 20 However, we are aware of no current registry, analogous to clinicaltrials.gov, for retrospective database analyses. A registration requirement for database studies, similar to that enacted by journals for clinical trials in 2004 and 2005, presents a possible direction for the future.
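Why post hoc "Texas sharp-shooting" is so easy in claims data can be shown with a short calculation. The sketch below is a hypothetical illustration, not an analysis from any cited study. It assumes k independent post hoc comparisons, each tested at alpha = 0.05, with no true effect in any of them. The function names are invented for this example.

```python
# Hypothetical illustration, not taken from any cited study.
# Assumptions: k independent post hoc comparisons, each tested at level alpha,
# and no true effect in any of them (every null hypothesis is true).
import random


def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one false-positive 'finding' among k independent tests."""
    return 1.0 - (1.0 - alpha) ** k


def simulate_familywise_error(alpha: float, k: int, runs: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo check: under a true null, each p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Draw k null p-values; count the run if any falls below alpha.
        if any(rng.random() < alpha for _ in range(k)):
            hits += 1
    return hits / runs


if __name__ == "__main__":
    for k in (1, 5, 10, 20):
        exact = familywise_error(0.05, k)
        simulated = simulate_familywise_error(0.05, k)
        print(f"k={k:2d}  exact={exact:.3f}  simulated={simulated:.3f}")
```

Suppose an analyst examines 20 post hoc cuts of a database and none reflects a real effect. The formula gives 1 - 0.95^20, or about 0.64, so at least one "significant" subgroup is more likely than not. Real subgroup analyses are usually correlated, which typically makes the inflation smaller than this independent-test figure, though still above the nominal 5%.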
Perhaps the most important factor in the solution to the problems of bias and misconduct in research of all types may be the most difficult to achieve: a commitment by all parties involved to accept, rather than abdicate, personal responsibility for the effects that their decisions have on the lives of patients. A gloomy assessment made in a 2005 letter to the editor of BMJ, following the announcement of its requirement to register clinical trials, made this point: "If people are dishonest and intent on committing research misconduct, then they will do so; preventing them from deviating from the protocol will simply encourage them to seek other ways to achieve their aims." 46 Although perhaps overly cynical, the letter writer's observation contained an important grain of truth. Researchers are ultimately responsible for the integrity of the work that they produce. Database owners must work to ensure that clinical trial registry information is complete and accurate. Peer reviewers must make the effort to research and identify conceptual and methodological flaws that compromise the internal or external validity of study findings. When peer reviewers identify serious flaws in a manuscript, editors must value their work enough to take it seriously, insist on modification of the study report to address the identified errors, and reject the manuscript in the unfortunate event that the problems are not corrected. As others have pointed out, those who have "placed themselves at risk by volunteering for clinical trials" 6,11 deserve nothing less.

The Best Disinfectants: Sunshine, CONSORT, and Common Sense
Even the most casual observer of the current debate over quality of evidence in health care should acknowledge that its multifaceted nature requires a multifaceted solution. As we have pointed out previously, turning to financial sponsorship as the sole source of poor quality in published research is facile, fails to recognize that not all biases are financially motivated, and ignores the high-quality work that has been produced by some researchers working in for-profit companies. 16 Similar observations could be made about the argument that turning research over to a government agency or academic institution will solve the problem. In today's environment of consulting arrangements and multiple business affiliations, it is unrealistic and naive to expect that all work performed under the aegis of a nonprofit organization or government agency will be unbiased and free of financially motivated decision making. All types of organizations-public, private, and academic-are capable of producing high-quality or low-quality work.
In theory, developing solutions to the problem of inadequate research evidence is in the interest of all involved parties. Pharmaceutical manufacturers want their drugs approved and available for sale throughout the life of the patents. Payers want to encourage the use of appropriate and cost-effective therapies. Patients and their family members increasingly demand access to information about treatment options and are often more than willing to share their opinions and experiences on blogs. Yet, studies of the production of research "evidence" provide a sobering refutation of the view that a natural process of competing self-interests of multiple parties can solve the problems of bias and misconduct. It is clear that when research decisions are made behind closed doors, the results are sometimes more frightening than enlightening.
We continue to beg for assessment of research evidence using currently available tools such as CONSORT. We also would like to see increasing attention given to the quality and specificity of protocol information submitted to the FDA and continuation of the important work of comparing clinical trial protocols to published literature. However, comparison of protocols to final study reports should be tempered by the understanding that good science sometimes requires making changes to a design that proved unrealistic or inappropriate when viewed with "20-20 hindsight." As others have pointed out, 11,12 unavoidable circumstances, such as clinical information published during the conduct of a clinical trial that invalidates the planned outcome measure, sometimes necessitate protocol modification. Not every post hoc methodological change is an attempted manipulation of study results, as any researcher who has encountered an unforeseen problem, such as an unexpectedly high dropout rate in a clinical trial or a multimillion-dollar cost outlier in a retrospective claims database analysis, will attest. However, researchers should be willing to report and explain clearly all post hoc changes.
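Comparing protocols with publications, as Chan et al. did, partly comes down to comparing the outcome lists declared in each document. The sketch below is a hypothetical illustration, not the procedure those authors used. `OutcomeReport` and `major_discrepancies` are invented names. The code encodes only the four "major discrepancy" categories, (a) through (d), from the 2004 study. A flag marks a change that needs explanation. As the paragraph above stresses, a flag is not by itself evidence of manipulation.

```python
# Hypothetical sketch of a protocol-versus-publication outcome check.
# The record layout and function names are illustrative assumptions; the four
# flag types follow the "major discrepancy" categories (a)-(d) of Chan et al. (2004).
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OutcomeReport:
    """Outcomes as declared in one document (a protocol or a publication)."""
    primary: set[str]
    secondary: set[str] = field(default_factory=set)
    power_outcome: str | None = None  # outcome used in the sample size calculation


def major_discrepancies(protocol: OutcomeReport, publication: OutcomeReport) -> list[str]:
    flags: list[str] = []
    # (a) designated primary in the protocol but reported as secondary
    for outcome in sorted(protocol.primary & publication.secondary):
        flags.append(f"(a) primary reported as secondary: {outcome}")
    # (b) a protocol primary outcome missing from the report altogether
    reported = publication.primary | publication.secondary
    for outcome in sorted(protocol.primary - reported):
        flags.append(f"(b) primary outcome omitted: {outcome}")
    # (c) a primary outcome in the report that was not primary in the protocol
    #     (simplifying assumption: a promoted secondary outcome counts here too)
    for outcome in sorted(publication.primary - protocol.primary):
        flags.append(f"(c) new primary outcome introduced: {outcome}")
    # (d) power calculated on different outcomes in the two documents
    if (protocol.power_outcome and publication.power_outcome
            and protocol.power_outcome != publication.power_outcome):
        flags.append("(d) power calculation based on a different outcome")
    return flags


if __name__ == "__main__":
    protocol = OutcomeReport({"pain score"}, {"function"}, power_outcome="pain score")
    publication = OutcomeReport({"function"}, {"pain score"}, power_outcome="function")
    for flag in major_discrepancies(protocol, publication):
        print(flag)
```

In the invented example, the primary and secondary outcomes are swapped between protocol and publication. The check therefore prints three flags: (a) for "pain score", (c) for "function", and (d) for the changed power outcome. The design is deliberately mechanical. It surfaces changes for authors to explain rather than judging whether a change was justified.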